Abstract

Validation and verification (V&V) are procedures 
used to evaluate system structure or behavior with 
respect to a set of requirements. Although expert 
systems are often developed as a series of prototypes 
without requirements, it is not possible to perform 
V&V on any system for which requirements have not 
been prepared. In addition, there are special prob- 
lems associated with the evaluation of expert systems 
that do not arise in the evaluation of conventional 
systems, such as verification of the completeness and 
accuracy of the knowledge base. The criticality of 
most National Aeronautics and Space Administration 
(NASA) missions makes it important to be able to 
certify the performance of the expert systems used to 
support these missions. This paper presents recom- 
mendations for the most appropriate methods for 
integrating V&V into the Expert System Develop- 
ment Methodology (ESDM) and suggestions for the 
most suitable approaches for each stage of ESDM 
development. 

Introduction 

Expert systems are mechanizations of the cognitive 
problem-solving capabilities of human experts. At the 
outset of expert system development, it is not known 
whether it is even possible to model and mechanize 
the mental processes of human experts, much less 
how long it would take and how much it would cost to 
do so. The procedures used to develop expert systems 
are directed toward extracting from the expert the 
knowledge and skills used in an expert task and 
toward preparing a series of successively more 
refined prototypes that mimic the behavior of the 
human expert. Expert systems tend to evolve rather 
than follow a planned course of development, and 
their life cycle differs significantly from the life cycle 
of conventional systems. 

Because of the differences in the life cycles of expert 
and conventional systems and the need for sound 
management of expert system development projects, 
the Data Systems Technology Division of the 
National Aeronautics and Space Administration/ 
Goddard Space Flight Center (NASA/GSFC) 4 years
ago undertook a program to formulate a methodology
for expert systems to be developed at GSFC. The 
result of this effort was the Expert System Develop- 
ment Methodology (ESDM) (CSCa, 1988; CSCb, 
1988; CSCc, 1988; Sary et al., 1990; CSC, 1989; 
Gilstrap, 1990). 

Briefly, ESDM is a risk-driven development method- 
ology. Areas of risk in the development are identified, 
and work on any one phase is directed toward reduc- 
ing the next highest remaining risk. Risks due to 
uncertainty are inherent in any system development 
but are greater at the outset of expert system develop- 
ment because of the unknowns in the modeling of the 
human expert’s cognitive behavior. 

The system life cycle in ESDM is divided into five 
stages, each of which is further divided into five steps 
as shown in the spiral model in Figure 1. This spiral 
model is similar to that recommended by Boehm 
(1986) for conventional systems. The Boehm spiral 
model is also used for expert systems by Stachowitz 
and Combs (1987), Stachowitz et al. (1988), O’Keefe 
and Lee (1990), and others. 

At the time the ESDM task was initiated, it was 
decided that the establishment of procedures and 
methods for performing validation and verification 
(V&V) would be deferred. The original focus was on 
methodology; however, V&V are concerned with 
both methodological and technical issues. Because of 
the criticality of NASA missions, it is necessary to be 
able to certify the behavior of expert systems and to 
verify their performance. A study, which is the basis 
for this paper, was undertaken to determine the steps 
necessary to integrate V&V into ESDM. 

As pointed out by Green and Keyes (1987), V&V has 
seldom been done for expert systems. The primary 
reason is that requirements analyses are not always 
done for expert systems. Without requirements, it is, 
by definition, not possible to perform V&V. A sec- 
ondary reason is that there are special problems, dis- 
cussed in more detail later, in doing V&V for expert 
systems, even when requirements exist. 

ESDM recommends that requirements be prepared 
in all expert system developments, but does not man- 
date them. However, requirements should not be 
prepared until the major uncertainties about the system
have been resolved. Typically, this does not occur
until after the field stage of work has been completed. 

Validation and Verification 

The terms validation, verification, and testing are
defined as follows in the Institute for Computer 
Sciences and Technology (ICST) special publication 
on V,V&T (1988): 

“Validation, Verification and Testing (V,V&T):
A process of review, analysis, and testing
employed throughout the software development
lifecycle ... which helps ensure the production
of quality software.

“Validation: The determination of the correctness
of the end product (code) with
respect to the software requirements, i.e.,
does the output conform with what is
required?

“Verification: The determination that each
phase and subphase of the development lifecycle
is correct, complete, and consistent
with itself and with its predecessor product.

“Testing: The examination of program
behavior by executing the program on sample
or operational data sets to determine the
correctness of the program.”

Verification is more concerned with the structure or 
form of a system, while validation is more concerned 
with behavior. In simple terms, as Boehm (1984) 
expresses it, in verification we ask 

“Am I building the product right?” 

and in validation we ask 

“Am I building the right product?” 

Green and Keyes (1987) point out that verification is,
in many cases, a “paper” activity, while validation is a 
“live” activity involving testing. In conventional sys- 
tems, validation and verification tend to find different 
types of errors; they are complementary processes, 
and neither is sufficient by itself to uncover all the 
errors in a system. 

Testing and Inspection 

The primary tools used in V&V are testing and 
inspection. The purpose of testing is to find errors, 
omissions, or unnecessary elements in a system. The 
most common types of tests used in conventional 
development are 

• Unit tests: these are primarily path tests
and are directed toward finding gross structural
errors

• Integration tests: these are performed each
time a new module is added to a system being
developed

• System tests: these are performed to
ensure that a system as constructed meets
the requirements specified for the system

• Acceptance tests: these tests are performed
by the customer or user on delivery
of a system

Unit and integration tests help to verify that the sys- 
tem has been built correctly. System tests validate the 
system with respect to requirements. The validation 
of requirements (against the real world) may require 
special tests to be designed and executed. For exam- 
ple, in some conventional system development, field 
tests under actual conditions might be required to 
validate the system requirements. Such tests would 
be in addition to system tests for ensuring that the 
system satisfies requirements. 

In addition to testing, software system inspection is 
also performed to uncover errors or to verify system 
consistency or correctness. Inspection techniques 
have also been used in developing expert systems, and 
a wide variety of special inspection techniques have 
been designed for expert systems. For example, the 
human expert is given the knowledge base for an 
expert system and asked to review it for errors of 
omission or commission. A number of these methods 
are listed and briefly described in the ESDM user’s 
manual (CSCa, 1988). 


Provisions for Expert System V&V 

Based on literature review and analysis of the prob- 
lem, a methodology must provide for the following in 
order to support V&V: 

• Requirements generation (O’Keefe & Lee,
1990; Culbert et al.[a], 1987; Culbert et
al.[b], 1987; Barrett, 1990; Preece, 1990)

• V&V planning (Barrett, 1990), defining the 
following: 

— Objective of each test 
— Criteria for each test 

— Data required to measure attainment of 
objective 

— Data acquisition procedures for collect- 
ing required data 

— Analytical procedures for determining 
whether criteria have been met 

• Guidelines (Green & Keyes, 1987) for the 
following: 

— Procedures for tests and inspections 
— Requirements-traceable tests 
— Parameters of system to be tested 

• Special testing of knowledge-based systems, 
as required 

• Automatic testing tools (Stachowitz & 
Combs, 1987; Stachowitz et al., 1988; Gupta 
& Biegel, 1990) to support the following: 

— Test requirements determination and 
documentation 

— Test planning 

— Execution of test and inspection 
procedures 

— Execution of special testing 
— Requirements tracing 

Without requirements, V&V are simply not possible 
in the usual sense of the term. The cost of preparing
requirements in some later stage of expert system 
development, after sufficient information has been 
acquired, is considered a good investment. Require- 
ments provide the basis for testing all subsequent 
prototypes and are essential for testing critical appli- 
cations in operational environments. 
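
As an illustration only, the planning items listed above can be recorded as one structured entry per test, which makes requirements tracing a mechanical check. The following Python sketch is not part of ESDM; the class, field, and function names are hypothetical, and the requirement identifiers are invented for the example.

```python
from dataclasses import dataclass, field


@dataclass
class PlannedTest:
    """One entry in a V&V test plan (all field names are illustrative)."""
    name: str
    objective: str        # what the test is meant to establish
    criteria: str         # criterion used to decide pass or fail
    data_required: str    # data needed to measure attainment of the objective
    acquisition: str      # how the required data will be collected
    analysis: str         # how the criterion will be checked against the data
    requirement_ids: list = field(default_factory=list)  # traced requirements


def untraced_requirements(requirement_ids, plan):
    """Return the requirement IDs that no planned test traces back to."""
    covered = {rid for test in plan for rid in test.requirement_ids}
    return sorted(set(requirement_ids) - covered)


def untraceable_tests(requirement_ids, plan):
    """Return the names of tests that cite no known requirement."""
    known = set(requirement_ids)
    return [t.name for t in plan if not known.intersection(t.requirement_ids)]


# Hypothetical requirements and plan, for illustration only.
requirements = ["R1", "R2", "R3"]
plan = [
    PlannedTest("T1", "advice matches expert", "agreement on sample cases",
                "archived cases", "operations log", "case-by-case comparison",
                requirement_ids=["R1", "R2"]),
    PlannedTest("T2", "exploratory run", "no crash",
                "synthetic inputs", "generator script", "log inspection"),
]

print(untraced_requirements(requirements, plan))
print(untraceable_tests(requirements, plan))
```

In this sketch, a requirement with no test and a test with no requirement are both reported, since either one indicates a gap in the plan.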

The methodology for developing expert systems must 
mandate test planning and provide for independent 
testing and inspection. It must also provide guidelines 
and specify procedures for performing tests and
inspections. Experience has shown that for best 
results, testing and implementation of expert systems 
should be independently managed, just as for conven- 
tional systems development. 

Expert systems are not conventional systems, how- 
ever, and we can test expert systems in ways that are 
impossible with conventional systems. The special 
kinds of testing that are possible for expert systems 
exist because of the knowledge base, the different 
types of knowledge representation, and the non- 
procedural methods of reasoning that may be used in 
expert systems. Examples of special tests include 
searching for contradictions in the knowledge base, 
ensuring that explanations given by the system are 
understandable and reasonable, and ensuring that 
the system does not produce patently absurd or illogi- 
cal conclusions. Details of such tests may depend 
strongly on the specific knowledge representation 
used in the system and require more, perhaps much 
more, preparation than do tests to ensure adherence 
to system requirements. 
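
A search for contradictions can be sketched for one simple knowledge representation. The Python example below assumes rules of the form "if all premises hold, set attribute to value" and assumes each attribute is single-valued; both are assumptions made for illustration, and the rule names and facts are hypothetical. Other representations would need a different check.

```python
from itertools import combinations


def find_contradictions(rules):
    """Return pairs of rule names that can fire together yet disagree.

    Each rule is (name, premises, (attribute, value)), where premises is a
    set of facts. Two rules are flagged when one rule's premises are a
    subset of the other's (so both fire whenever the larger set holds) and
    they assign different values to the same attribute.
    """
    conflicts = []
    for (n1, p1, (a1, v1)), (n2, p2, (a2, v2)) in combinations(rules, 2):
        fire_together = p1 <= p2 or p2 <= p1
        if fire_together and a1 == a2 and v1 != v2:
            conflicts.append((n1, n2))
    return conflicts


# Hypothetical rule base, for illustration only.
rules = [
    ("r1", {"battery_low"}, ("mode", "safe")),
    ("r2", {"battery_low", "in_eclipse"}, ("mode", "nominal")),
    ("r3", {"in_eclipse"}, ("heater", "on")),
]

print(find_contradictions(rules))
```

The check is deliberately conservative: it reports only rule pairs that are certain to fire together, so it can miss conflicts that arise through chains of inference.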

Most expert systems developers would agree that it is 
desirable to have automated tools for testing, and 
some would argue that it is essential. Stachowitz and 
Combs (1987) state: 

“It is our conjecture that software valida- 
tion can be more easily performed in a 
knowledge-based system environment. In 
such an environment the number of life cycle 
steps is reduced from the traditional four 
(requirements development, specification 
development, design development, code 
development) to just the first two, resulting 
in a considerable reduction of the amount of 
validation work to be performed.” 

Also, Gupta and Biegel (1990) list seven limita- 
tions of manual test planning and execution in sup- 
port of expert system testing, ranging from possible 
lack of objectivity to the “astronomically large” num- 
ber of test cases needed to extensively test a system. 
Certainly the likelihood is quite high that any com- 
plex expert system designed for highly critical 
requirements will require special tests that are not 
available from generic products. An expert system 
development methodology must make provisions for 
evaluating criticality and for developing special test 
procedures in parallel with the system, particularly in 
the high-criticality cases. 


Special Expert System V&V Tests 

Special tests, for which there may be no generic, com- 
mercially available testing tool, are conducted to 
resolve differences among multiple experts and 
to verify the following: 

• Correctness of reasoning 

• Inference engine 

• Knowledge base 

• Correctness and value of the output advice/ 
actions 

• Correctness of explanations 

• Expertise of the expert 

• System boundaries 

In addition to validating or verifying the knowledge 
base, performance of one or more of the above tests 
may be necessary before the system can be certified 
for critical applications. Test design should be consid- 
ered to be just as important as the design of the 
knowledge representation scheme, the inference 
engine design, or the knowledge engineering meth- 
ods used on the project. 

Culbert and colleagues discuss the issues involved in 
performing many of these special kinds of tests in the 
context of NASA systems (a, 1987) and present their 
own expert system development methodology (b, 
1987), one that supports verification and validation. 
The approach they describe makes use of panels of 
domain experts, users, and managers with system 
responsibilities to ensure that all applicable view- 
points are represented, both during development 
work and during inspections (Culbert et al.[b], 1987). 
The life cycle used in this approach consists of four 
phases: 

• Problem definition

• Initial prototype

• Expanded prototype

• Delivery/maintenance

The emphasis in this approach is not on the use of 
tests, automated or otherwise; rather, it is on ensur- 
ing that all relevant human skills are brought to bear 
in the development process, an important objective
and one that should be supported.

The mechanism used for verifying the correctness of 
knowledge bases depends on the kind of reasoning 
process used in the system. The framework for verify- 
ing knowledge bases that are developed for use with 
ordinary logic is readily available. For example, resolution
can always be used to mechanize the determination
of rule-base consistency, although it is not the
most efficient way. However, knowledge bases devel- 
oped for nonmonotonic reasoning require an exten- 
sion of the procedures developed for monotonic rea- 
soning. Chang and colleagues (1990) discuss this 
problem and indicate how the automated testing tool, 
Expert Systems Validation Associate (EVA), could be 
extended to the nonmonotonic case. 
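
The use of resolution for a consistency check can be sketched for the propositional (monotonic) case. The Python example below is a minimal illustration and is not the method used in EVA: rules and facts are assumed to be already converted to clauses, a literal is a string, and a leading "~" marks negation. The function names are hypothetical. As the text notes, plain saturation is not an efficient procedure.

```python
def negate(literal):
    """Return the complement of a literal ("a" <-> "~a")."""
    return literal[1:] if literal.startswith("~") else "~" + literal


def resolvents(c1, c2):
    """All clauses obtained by resolving c1 and c2 on one complementary literal."""
    out = set()
    for lit in c1:
        if negate(lit) in c2:
            out.add(frozenset((c1 - {lit}) | (c2 - {negate(lit)})))
    return out


def is_consistent(clauses):
    """Saturate under resolution; deriving the empty clause means inconsistency."""
    known = {frozenset(c) for c in clauses}
    while True:
        new = set()
        for a in known:
            for b in known:
                if a == b:
                    continue
                for r in resolvents(a, b):
                    if not r:          # empty clause: contradiction derived
                        return False
                    new.add(r)
        if new <= known:               # nothing new can be derived
            return True
        known |= new


# "a -> b" is the clause {~a, b}; "b -> c" is {~b, c}; the fact "a" is {a}.
base = [{"~a", "b"}, {"~b", "c"}, {"a"}]
print(is_consistent(base))                    # no contradiction derivable
print(is_consistent(base + [{"~a", "~c"}]))   # adding "a -> not c" conflicts
```

Consistency here is judged relative to the facts supplied: the four rules without the fact "a" would be reported as consistent, because nothing forces both c and its negation.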

The long-range goal of EVA was to construct an 
integrated set of generic tools to validate any 
knowledge-based system in any expert system shell 
(Stachowitz et al., 1988). It currently provides 11 tests 
that assist in verifying knowledge-based systems, 
ranging from structure and logic checks to test case 
generation and behavior verification. The addition of 
a capability for handling nonmonotonic logic would 
be a significant enhancement to the original system. 

The problems of validating an expert or of resolving 
differences among multiple experts are of a very dif- 
ferent order from the problems of verifying an expert 
system or features of an expert system. An evaluation 
technique called the analytic hierarchy process (AHP), 
developed by Saaty (1980), was adapted by Liebowitz 
(1985) for evaluation of expert systems. The AHP 
technique makes use of pairwise subjective evalua- 
tions (Is A more important than B?) to achieve an 
integrated, global evaluation and ranking of many 
factors. A key feature of the method is that it is 
tolerant of intransitivity of relations (team A beats 
team B, which beats team C, which then beats team 
A). This technique can also be used to resolve the 
differences among multiple experts. 
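
The pairwise-comparison idea can be illustrated with the principal-eigenvector formulation usually associated with Saaty's AHP. The Python sketch below is an assumption about how such an evaluation might be computed, not a description of Liebowitz's adaptation; the function names and the comparison values are hypothetical. It approximates the eigenvector by power iteration and reports Saaty's consistency index, which is zero for perfectly consistent judgments and grows as judgments become intransitive.

```python
def ahp_priorities(matrix, iterations=100):
    """Approximate the principal eigenvector of a pairwise comparison matrix.

    matrix[i][j] states how strongly factor i is preferred to factor j, and
    matrix[j][i] is expected to be its reciprocal. Returns (weights,
    lambda_max), with the weights normalized to sum to 1.
    """
    n = len(matrix)
    weights = [1.0 / n] * n
    for _ in range(iterations):
        product = [sum(matrix[i][j] * weights[j] for j in range(n))
                   for i in range(n)]
        total = sum(product)
        weights = [p / total for p in product]
    product = [sum(matrix[i][j] * weights[j] for j in range(n))
               for i in range(n)]
    lambda_max = sum(product[i] / weights[i] for i in range(n)) / n
    return weights, lambda_max


def consistency_index(lambda_max, n):
    """Consistency index (lambda_max - n) / (n - 1)."""
    return (lambda_max - n) / (n - 1)


# Consistent judgments: A is twice B, B is twice C, A is four times C.
consistent = [[1, 2, 4], [1 / 2, 1, 2], [1 / 4, 1 / 2, 1]]
# Intransitive judgments: A beats B, B beats C, C beats A.
cyclic = [[1, 3, 1 / 3], [1 / 3, 1, 3], [3, 1 / 3, 1]]

for name, m in (("consistent", consistent), ("cyclic", cyclic)):
    w, lam = ahp_priorities(m)
    print(name, [round(x, 3) for x in w], round(consistency_index(lam, len(m)), 3))
```

In the cyclic case the procedure still returns weights (an even three-way split), while the large consistency index signals that the judgments disagree with one another. The same calculation could be applied to comparison matrices supplied by different experts.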

ESDM Modifications 

The GSFC Expert System Development Methodol- 
ogy (ESDM), represented by the spiral model of 
Figure 1, currently supports some of the provisions of 
a methodology needed for V&V. The following lists 
the changes to ESDM needed to more fully support 
V&V: 

• Requirements generation. ESDM does not 
now mandate the preparation of require- 
ments. ESDM will be modified to make 
requirements mandatory in all high- 
criticality projects to ensure that V&V are 
possible. 

• V&V planning. ESDM currently requires the 
design of tests to check for stage completion. 
ESDM will be modified to specify V&V for 
all systems for which requirements are
developed. ESDM documentation requirements
will also be modified to make the test
plan a formal project document. 

• Guidelines for procedures, tests, test param- 
eters. ESDM will be modified to provide 
guidelines for tests and testing procedures 
that can be used. The choice of tests for a 
particular project will be left to the project 
manager and the knowledge engineer. 

• Special testing. ESDM will provide a checklist 
of special tests. This checklist will include 
the type of knowledge representation, the 
type of inferencing method, and the lan- 
guage or shell that is appropriate for each 
special test. 

• Automatic testing tools. For critical projects, 
ESDM will specify an analysis of automated 
tool requirements. ESDM will also provide 
guidelines for estimating the costs, labor, 
and schedule to develop such automated 
tools in parallel with the expert system. 

The above are the primary changes to ESDM needed 
to support V&V. ESDM will not mandate the specific 
tests that must be performed, but will provide guide- 
lines that can be used by project managers and devel- 
opers. In the future, the guidelines in ESDM are 
expected to be replaced by standards. At present, 
there is not sufficient experience with V&V in any 
expert system methodology to be able to define such 
standards. To force standards at this early stage might
unduly restrict the technology of knowledge-based 
systems. 

Evaluation of Expert Systems 

Testing is a special case of evaluation of systems. As 
discussed by Liebowitz (1985), evaluations are helpful 
in determining whether an expert system is meeting 
its intended goals. The main difference between eval- 
uation and testing is that evaluations are performed 
with respect to a specific set of objectives or purposes,
whereas tests are evaluated with regard to correct- 
ness or performance. Evaluation usually requires the 
use of subjective judgment with respect to the objec- 
tives or purposes of the system: 

“In your judgment, is system X suitable for
application (or purpose) Y?”

The procedure for making evaluations is more elab- 
orate than is implied by the above question. Usually, 
the features or characteristics of the system are 
evaluated one at a time, and an overall evaluation is
obtained by integrating or producing a weighted aver- 
age of the estimates of suitability over all the features. 
In V&V, the objectives or purposes are specified by 
the requirements of the system, and subjective judg- 
ment is reduced to a pass/fail decision. The overall 
global decision to accept or reject is usually based on 
the percentage of cases that pass. For expert systems, 
it is not reasonable to require 100-percent success for 
all tests, because a human expert does not succeed 
100 percent of the time. 
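
The two kinds of overall decision described above can be written out as simple arithmetic. The Python sketch below is illustrative only: the feature names, scores, weights, and the 90-percent acceptance threshold are hypothetical values chosen for the example, not figures from ESDM.

```python
def weighted_suitability(scores, weights):
    """Weighted average of per-feature suitability scores."""
    total_weight = sum(weights[f] for f in scores)
    return sum(scores[f] * weights[f] for f in scores) / total_weight


def accept(results, required_pass_rate):
    """Accept when the fraction of passing test cases meets the threshold.

    results is a list of booleans, one per test case. Returns the decision
    together with the observed pass rate.
    """
    pass_rate = sum(results) / len(results)
    return pass_rate >= required_pass_rate, pass_rate


# Evaluation: subjective scores per feature, combined by weight.
scores = {"accuracy": 0.9, "ease_of_use": 0.6, "speed": 0.8}
weights = {"accuracy": 0.5, "ease_of_use": 0.3, "speed": 0.2}
print(round(weighted_suitability(scores, weights), 2))

# V&V: pass/fail per test case, accepted on the percentage that pass.
results = [True] * 47 + [False] * 3
print(accept(results, required_pass_rate=0.90))
```

The threshold is set below 1.0 to reflect the point made in the text that an expert system, like a human expert, is not expected to succeed in every case.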

The special tests that can be constructed for expert 
systems correspond to a set of pass/fail tests for the 
properties, characteristics, or features of expert sys- 
tems. For evaluations of all sorts, it is recommended 
that the features selected for evaluation be both independent
and collectively exhaustive of all relevant characteristics
(i.e., they should cover all of the important
properties of the system). A checklist of such features 
relevant to the evaluation of a potential expert system 
for suitability and worth was included in ESDM for 
use with the Test for Application of Risk-Oriented
Technology (TAROT). 

A number of authors have developed lists of expert 
system features, including Liebowitz (1985), Marcot 
(1987), Boehm et al. (1978), and Stachowitz and 
Combs (1987). The following list of system character- 
istics was obtained by consolidating and then elimi- 
nating duplications from these other lists. It is more 
general than the current ESDM list of system charac- 
teristics in that it covers more factors than those 
needed for evaluation of project suitability. It will 
replace the current ESDM list. 

• Structural parameters 
— Completeness 
— Comprehensibility 
— Conciseness 

— Correctness (freedom from contradic- 
tion) 

— Consistency (freedom from anomaly) 

— Legibility 
— Modularity 
— Self-descriptiveness 
— Simplicity 
— Structuredness 
— Understandability 


• Behavioral parameters 
— Utility 

-- Accuracy 
-- Effectiveness 

-- Correctness (correctness of output)
— Credibility 
— Intelligibility 
— Ease of use 
— Producibility 
-- Speed 
— Operability 
-- Efficiency 
— Flexibility 
-- Interoperability 
-- Maintainability 
-- Portability 
— Reliability 
-- Reusability 
-- Robustness 
-- Sensitivity 
-- Stability 
— Quality 

-- Completeness (breadth, depth) 

-- Conciseness 
-- Consistency 
-- Correctness 
-- Integrity 
-- Maintainability 
-- Reliability 
-- Testability 
— Suitability 
-- Worth 
-- Risk 
-- Benefits 
-- Costs 

-- Urgency/priority
-- Criticality 

The special tests needed for a given expert system 
project are all related to one or more of the structural 
and behavioral characteristics listed above. System 
developers can use the checklist to help select those 
features that are critical in their application. Once 
selected, special tests must then be developed to 
examine each of the features or characteristics. 

Summary 

Although V&V are difficult to accomplish for expert 
systems, the main problem has been that require- 
ments have, in many cases, been missing. Without 
requirements, V&V are not possible. However, even
when requirements are available, performing V&V 
on expert systems may not be easy because of the 
large number of different types of tests and inspec- 
tions that are possible for such systems. 

ESDM, the GSFC methodology for development of 
expert systems, is being upgraded to provide guide- 
lines and checklists that assist managers and develop- 
ers in planning, documenting, and performing V&V 
tests and inspections to certify expert systems for use 
in critical applications. 

There is no one, simple, magic solution to the 
achievement of reliable, bug-free, certifiable soft- 
ware of any kind. High-quality, reliable software 
requires many different types of tests and inspections, 
as well as adherence to sound design and coding prac- 
tices. The best way to achieve quality expert systems is 
for project managers and developers to be aware of 
the options available for testing and inspection and to 
plan on using the appropriate tools at the right time. 
ESDM modifications support these objectives, 